-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor parameter tolerance handling #114
Conversation
Before, in-range values were defined by the distance to the closest parameter point. Now, everything inside the convex hull is automatically considered in range.
@Scienfitz @AVHopp: This is my first proposal for the tolerance implementation. Note that the new version does NOT (yet) include a second tolerance value, i.e. it does not distinguish between the "matching tolerance" and the "out of convex hull tolerance". The reason is simply because I'm not yet sure if the other approach adds too much complexity for a user, and perhaps we should postpone the second tolerance discussion until we have finally fixed the scaling (which is affected by that decision). With the current minimal implementation, users can still do anything they want. That is, small deviations are automatically accounted for due to the new tolerance default value, and for larger deviations, the can actively decide to ignore the errors by setting the corresponding flag. But let me know what you think |
) | ||
if not param.is_in_range((val := row[param.name])): | ||
if param.is_numeric and on_tolerance_violation == "ignore": | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldnt this be a continue
? still need to checkt he other parameters in the list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also think that this needs to be continue
f"The value '{val}' is outside the range of parameter " | ||
f"'{param.name}'. " | ||
f"If you expected a match between your input " | ||
f"and the parameter, consider increasing the parameter's " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error is shown for both numerical and non-numerical parameters, hence should make the possible distinct situations clearer. Eg a categorical parameter has no tolerance
f"and the parameter, consider increasing the parameter's " | ||
f"tolerance value or adding more parameter values. " | ||
f"You can bypass this check using the 'ignore' or 'warn' mode." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i cant comment on the line, but we shouldnt forget about whats below here in line 411+
- it still uses old logger mechanism
- the warnings there should also be controlled by on_tolerance_validation. eg no need to print a no match found warning if its set to ignore
@@ -159,7 +159,7 @@ def validate_config(cls, config_json: str) -> None: | |||
def add_measurements( | |||
self, | |||
data: pd.DataFrame, | |||
numerical_measurements_must_be_within_tolerance: bool = True, | |||
on_tolerance_violation: Literal["raise", "warn", "ignore"] = "raise", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since these violations are not an issue for scaling, I would go with warn
as default, not raise
. This is also needed in the API I think.
The user can explicitly still disallow violations but setting raise
which would be similar to explicitly setting the old numerical_measurements_must_be_within_tolerance
to True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly minor changes.
@@ -159,7 +159,7 @@ def validate_config(cls, config_json: str) -> None: | |||
def add_measurements( | |||
self, | |||
data: pd.DataFrame, | |||
numerical_measurements_must_be_within_tolerance: bool = True, | |||
on_tolerance_violation: Literal["raise", "warn", "ignore"] = "raise", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
@@ -226,12 +228,13 @@ def add_measurements( | |||
) | |||
|
|||
# Telemetry | |||
# TODO: Code is inefficient because of unnecessary second fuzzy matching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which unnecessary second fuzzy matching are you talking about here? Is this something happening within one of the functions?
) | ||
@tolerance.default | ||
def default_tolerance(self) -> float: | ||
"""Set the tolerance to fraction of the smallest value distance.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads as if the fraction
can be chosen by the user. Also, we should add the information in the docstring that this fraction is set to 0.1
parameters are matched with the search space elements only if there is a | ||
match within the parameter tolerance. If ``False``, the closest match is | ||
considered, irrespective of the distance. | ||
on_tolerance_violation: The mode determining what how to handle a missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either add some words about the different modes here or refer to the place where you described them.
) | ||
if not param.is_in_range((val := row[param.name])): | ||
if param.is_numeric and on_tolerance_violation == "ignore": | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also think that this needs to be continue
Closing this PR due to inactivity |
This PR refactors the tolerance implementation of numerical discrete parameters:
add_measurements
now takes anon_tolerance_violation
parameter which allow to control to behavior when encountering out-of-range measurementsSo overall, the mechanism remains similar to the previous one but more user-friendly.